Bioptic -- A Target-Agnostic Potency-Based Small Molecules Search Engine

Vinogradov, Vlad, Izmailov, Ivan, Steshin, Simon, Nguyen, Kong T.

arXiv.org Artificial Intelligence

Recent successes in virtual screening have been made possible by large models and extensive chemical libraries. However, combining these elements is challenging: the larger the model, the more expensive it is to run, making ultra-large libraries infeasible. To address this, we developed a target-agnostic, efficacy-based molecule search model, which allows us to find structurally dissimilar molecules with similar biological activities. We used best practices to design a fast retrieval system, based on processor-optimized SIMD instructions, enabling us to screen the ultra-large 40B Enamine REAL library with a 100% recall rate. We extensively benchmarked our model and several state-of-the-art models for both speed and retrieval quality of novel molecules.
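The abstract does not describe the retrieval internals, but the core idea of an exhaustive scan with 100% recall can be sketched in plain Python. This is a minimal illustration, not the paper's method: the 3-D vectors stand in for the model's learned activity-based embeddings, cosine similarity stands in for whatever scoring the real system uses, and the SIMD-accelerated kernels are not emulated here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def exhaustive_search(query_vec, library, top_k=2):
    """Brute-force scan of every library embedding.

    Because no approximate index or pruning is used, recall is 100%
    by construction -- the trade-off is that cost scales linearly
    with library size, which is why fast (e.g. SIMD) kernels matter.
    """
    scored = sorted(
        ((cosine(query_vec, v), i) for i, v in enumerate(library)),
        reverse=True,
    )
    return scored[:top_k]

# Made-up toy embeddings; a real library holds billions of entries.
library = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(exhaustive_search([1.0, 0.0, 0.0], library))
```

The exhaustive scan guarantees nothing is missed; the engineering challenge the abstract alludes to is making that linear scan fast enough for a 40B-molecule library.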


Do you really know the difference between Test and Validation Datasets?

#artificialintelligence

Many people don't really know the difference between test and validation. In machine learning these two terms are often used improperly, but they refer to two very different things. Even the literature sometimes reverses their meaning. When training a model, the dataset is usually divided into a training set, a validation set and a test set, but why are the last two sets needed? Keep reading and you will find your answers.
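The three-way split the teaser refers to can be sketched in a few lines of plain Python. The split fractions and seed below are illustrative choices, not prescribed values:

```python
import random

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a dataset and split it into train / validation / test partitions."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]                # held out for the final, one-time evaluation
    val = items[n_test:n_test + n_val]   # used repeatedly to tune and compare models
    train = items[n_test + n_val:]       # used to fit the model's parameters
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The comments capture the distinction the article draws: the validation set is consulted many times during development, while the test set is touched only once at the end.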


What is the Difference Between Test and Validation Datasets? - Machine Learning Mastery

#artificialintelligence

We can see this interchangeability directly in Kuhn and Johnson's excellent text "Applied Predictive Modeling". In this example, they are careful to point out that the final model evaluation must be performed on a held-out dataset that has not been used prior, either for training the model or for tuning the model parameters. Ideally, the model should be evaluated on samples that were not used to build or fine-tune the model, so that they provide an unbiased sense of model effectiveness. When a large amount of data is at hand, a set of samples can be set aside to evaluate the final model. The "training" data set is the general term for the samples used to create the model, while the "test" or "validation" data set is used to qualify performance.
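The discipline described above, tuning on one held-out set and evaluating exactly once on another, can be shown with a toy example. Everything here is made up for illustration: the model is a single threshold on a 1-D feature, the data is synthetic with a true cutoff at 0.6, and the threshold plays the role of any hyperparameter you might tune.

```python
import random

# Synthetic 1-D data: (feature, label) pairs, label 1 when feature >= 0.6.
rng = random.Random(0)
data = [(x, int(x >= 0.6)) for x in (rng.random() for _ in range(300))]
train, val, test = data[:200], data[200:250], data[250:]

def accuracy(threshold, samples):
    """Fraction of samples correctly classified by 'predict 1 iff x >= threshold'."""
    return sum((x >= threshold) == bool(y) for x, y in samples) / len(samples)

# Candidate thresholds come from the training data; the choice among
# them (the "tuning") is made on the validation set...
candidates = sorted({x for x, _ in train})
best = max(candidates, key=lambda t: accuracy(t, val))

# ...and the test set is consulted exactly once, for the final
# unbiased estimate of model effectiveness.
print(f"chosen threshold={best:.3f}, test accuracy={accuracy(best, test):.2f}")
```

Had the threshold been chosen to maximize test accuracy instead, the reported number would be optimistically biased, which is precisely the mistake the held-out test set exists to prevent.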